This report is an update to the analysis shown on 1/14/2022. Most steps are the same, with tweaks due to different data and a different outlier procedure.

It explores the relationship between wastewater and FirstConfirmed.Per100K. There are four components to this analysis.

  1. Removing putative outliers

  2. Binning analysis

  3. Smoothing signal

  4. Statistical analysis

This report does not present final answers, but it does present some convincing heuristics.

Data used are from the DSIWastewater package.

Data: The first look

The two data sets used in this analysis are the Madison case data sourced from the Wisconsin DHS and wastewater concentration data produced by the Wisconsin State Laboratory of Hygiene. This wastewater data has entries every couple of days from 15 September 2020 to 30 November 2022.

site | date | conf_case | FirstConfirmed.Per100K | pastwk.avg.casesperday.Per100K | sars_cov2_adj_load
Madison | 2020-09-15 | 43 | 10.97457 | NA | 5.3527971
Madison | 2020-09-19 | 108 | 27.56403 | 118.42857 | 1.2473200
Madison | 2020-09-22 | 42 | 10.71934 | 98.42857 | 1.9444380
Madison | 2020-09-23 | 95 | 24.24614 | 84.42857 | 1.6489902
Madison | 2020-09-24 | 64 | 16.33424 | 73.57143 | 0.9406148
Madison | 2020-09-25 | 66 | 16.84468 | 66.42857 | 0.6352719

The case data has a strong weekend effect, so for this section we look at a seven-day smoothing of FirstConfirmed.Per100K. A simple display of the data shows the core components of this story. First, the wastewater data is noisy. Second, there is a clear relationship between the two signals.

Wastewater concentration and daily Covid-19 case data for Madison. A seven-day moving average of FirstConfirmed.Per100K is used to reduce the day-of-the-week effect.
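The seven-day smoothing can be sketched as follows. This is only an illustration: it assumes a trailing window (the report does not say whether the average is trailing or centered), and the function name `trailing_mean` and the sample values are made up rather than taken from the DSIWastewater package.

```python
def trailing_mean(values, window=7):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical daily case rates with a dip on two "weekend" days each week.
daily = [20, 22, 21, 23, 22, 8, 6] * 3
smoothed = trailing_mean(daily)
print(smoothed[6:])
```

Because every full window covers exactly one of each weekday, the weekly dip cancels out and the smoothed series is flat for this artificial input.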

Removing potential outliers

Looking at the wastewater measurements, we observe some points many times larger than adjacent values, which suggests they are outliers. We used the 10 adjacent values on each side and marked points more than 2.5 standard deviations away from the group mean as outliers.

Wastewater concentration for Madison with potential outliers marked. Using a rolling symmetrical bin of 21 days as a sample, we reject points more than 2.5 standard deviations from the bin mean. This process is run multiple times to make the outlier selection robust.
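A minimal sketch of this rolling rejection rule is given below. It rests on several assumptions not stated in the report: the window includes the point being tested, points flagged on an earlier pass are left out of later windows, and passes stop once no new point is flagged. The name `flag_outliers`, its parameters, and the sample data are hypothetical.

```python
import statistics


def flag_outliers(values, half_width=10, n_sd=2.5, max_passes=5):
    """Mark points far from the mean of their rolling symmetric window."""
    flagged = [False] * len(values)
    for _ in range(max_passes):
        new_flags = list(flagged)
        for i, x in enumerate(values):
            if flagged[i]:
                continue
            lo = max(0, i - half_width)
            hi = min(len(values), i + half_width + 1)
            # Assumption: the window includes the point itself and skips
            # points that were flagged on earlier passes.
            window = [values[j] for j in range(lo, hi) if not flagged[j]]
            if len(window) < 3:
                continue
            mean = statistics.fmean(window)
            sd = statistics.stdev(window)
            if sd > 0 and abs(x - mean) > n_sd * sd:
                new_flags[i] = True
        if new_flags == flagged:
            break  # no new outliers, so further passes change nothing
        flagged = new_flags
    return flagged


# Hypothetical load series with one injected spike at index 15.
loads = [1.0, 1.2, 0.9, 1.1, 1.0] * 6
loads[15] = 50.0
flags = flag_outliers(loads)
print([i for i, f in enumerate(flags) if f])
```

Repeating the pass matters because a large spike inflates the standard deviation of its own window and can hide smaller spikes nearby until it has been removed.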

Data smoothing

The goal in this section is to smooth the data to get a similar effect without losing resolution.

Viral load smoothing

To get a good smoothing of the sars_cov2_adj_load measurement we employ loess smoothing. Loess smoothing fits a locally weighted regression in a sliding window over some number of points. We found the best smoothing when it uses data within approximately 0 weeks on both sides of each point. The displayed plot shows the visual power of this smoothing. We see in general that the smoothed N1 trails SLD. However, loess is symmetric: it uses points from the future to smooth each point, so it cannot be used in predictive modeling.

Loess-smoothed N1 and SLD FirstConfirmed.Per100K for Madison data. Combining locally weighted scatterplot smoothing with the SLD FirstConfirmed.Per100K from the previous figure gives the most refined comparison of the two signals discussed in this document.
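To make the idea of locally weighted smoothing concrete, here is a small sketch of a loess-style smoother: a weighted straight-line fit around each point using tricube weights. It is not the implementation used for the figure. The function name `loess`, the `span` parameter (the fraction of points used in each local fit), and the sample data are assumptions for illustration.

```python
def loess(x, y, span=0.3):
    """Local linear fit with tricube weights, evaluated at each x."""
    n = len(x)
    k = min(n, max(3, round(span * n)))  # points used in each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = max(dists[k - 1], 1e-12)  # bandwidth: distance to k-th nearest point
        sw = swx = swy = swxx = swxy = 0.0
        for xi, yi in zip(x, y):
            u = min(abs(xi - x0) / h, 1.0)
            w = (1.0 - u ** 3) ** 3  # tricube weight, zero outside the window
            dx = xi - x0  # centre on x0 so the intercept is the fitted value
            sw += w
            swx += w * dx
            swy += w * yi
            swxx += w * dx * dx
            swxy += w * dx * yi
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: use the weighted mean
        else:
            fitted.append((swxx * swy - swx * swxy) / denom)
    return fitted


# Hypothetical noisy signal: a straight line plus alternating +/-1 noise.
days = list(range(30))
noisy = [2 * d + 1 + (-1) ** d for d in days]
smooth = loess(days, noisy)
print(round(smooth[15], 3))
```

The window reaches both backwards and forwards from each point, which is the symmetry that rules this kind of smoother out for prediction.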

Towards a formal analysis

Cross-correlation and Granger causality are key components in formalizing this analysis. Cross-correlation measures the correlation between the two series over a range of time shifts, and Granger analysis tests whether past values of one series improve predictions of the other.

Comparison | Max cross correlation | Lag of largest cross correlation | P-value: wastewater predicts FirstConfirmed.Per100K | P-value: FirstConfirmed.Per100K predicts wastewater
Section 1: FirstConfirmed.Per100K vs sars_cov2_adj_load | 0.5225 | 13 | 0.0269 | 0.0000
Section 1: 7 Day MA FirstConfirmed.Per100K vs sars_cov2_adj_load | 0.5686 | 12 | 0.0181 | 0.0000
Section 2: FirstConfirmed.Per100K vs sars_cov2_adj_load | 0.6058 | 10 | 0.1877 | 0.0001
Section 4.3: 7 Day MA FirstConfirmed.Per100K vs Loess smoothing of sars_cov2_adj_load | 0.7095 | 8 | 0.0048 | 0.0396
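The lag search behind the cross-correlation columns can be sketched as below. The sketch assumes both series are sampled on the same evenly spaced dates, so real wastewater data would first need to be aligned with the case data. The sign convention is also an assumption: a positive lag here means wastewater leads cases, which may not match the convention used in the table. The names `pearson` and `best_lag` and the synthetic series are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def best_lag(wastewater, cases, max_lag=21):
    """Return (lag, correlation) maximising corr(wastewater[t], cases[t + lag])."""
    n = len(wastewater)
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = wastewater[: n - lag], cases[lag:]
        else:
            a, b = wastewater[-lag:], cases[: n + lag]
        r = pearson(a, b)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Synthetic check: the case series is the wastewater series delayed by 3 steps.
base = [math.sin(i / 3.0) + 0.5 * math.sin(i / 7.1) for i in range(63)]
wastewater = base[3:]
cases = base[:60]
lag, r = best_lag(wastewater, cases, max_lag=10)
print(lag, round(r, 4))
```

A peak in correlation at some lag only shows that the two curves line up when shifted. It does not by itself show that one series helps predict the other, which is what the Granger test addresses.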